Recovering motifs from biased genomes: application of signal correction

Authors

  • Samiul Hasan
  • Mark Schreiber
Abstract

A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters.
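As a rough illustration of two quantities the abstract refers to, the sketch below computes the information content of a single motif column against a uniform and against a biased background, and the Euclidean distance between two frequency matrices. It is not the authors' correction technique (the abstract does not give its formulas); the function names, the AT-rich background, and the column frequencies are all hypothetical values chosen for the example.

```python
import math

BASES = "ACGT"

def information_content(column, background):
    """Relative entropy (in bits) of one motif column against a background."""
    return sum(p * math.log2(p / background[b])
               for b, p in column.items() if p > 0)

def euclidean_distance(matrix_a, matrix_b):
    """Euclidean distance between two equal-length frequency matrices."""
    return math.sqrt(sum((col_a[b] - col_b[b]) ** 2
                         for col_a, col_b in zip(matrix_a, matrix_b)
                         for b in BASES))

# Hypothetical distributions, not data from the paper.
uniform = {b: 0.25 for b in BASES}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}   # assumed AT-rich genome
column = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}    # column that mirrors the background

# Against a uniform background the column appears to carry information;
# against the genome's own background it carries none.
ic_uniform = information_content(column, uniform)
ic_biased = information_content(column, at_rich)

# Distance between two one-column matrices.
distance = euclidean_distance([column], [uniform])

print(f"IC vs uniform background: {ic_uniform:.3f} bits")
print(f"IC vs AT-rich background: {ic_biased:.3f} bits")
print(f"Euclidean distance: {distance:.3f}")
```

In this toy case a column that merely reproduces the genome's composition scores above zero under a uniform-background assumption, which is the kind of overestimation the abstract describes.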

Journal:
  • Nucleic Acids Research

Volume 34  Issue 

Pages  -

Publication date 2006